Abstract

This paper is concerned with the performance of Orthogonal Matching Pursuit
(OMP) algorithms applied to a dictionary $\mathcal{D}$ in a Hilbert space $\mathcal{H}$. Given an element
$f \in \mathcal{H}$, OMP generates a sequence of approximations $f_n$, $n = 1, 2, \dots$, each of which
is a linear combination of $n$ dictionary elements chosen by a greedy criterion. We
study whether the approximations $f_n$ are in some sense comparable to best $n$-term
approximation from the dictionary. One important result related to this question
is a theorem of Zhang [8] in the context of sparse recovery of finite dimensional
signals. This theorem shows that OMP exactly recovers any $n$-sparse signal whenever
the dictionary $\mathcal{D}$ satisfies a Restricted Isometry Property (RIP) of order $An$ for some
constant $A$, and that the procedure is also stable in $\ell^2$ under measurement noise.

The main contribution of the present paper is to give a structurally simpler proof of
Zhang's theorem, formulated in the general context of $n$-term approximation from a
dictionary in arbitrary Hilbert spaces $\mathcal{H}$. Namely, it is shown that OMP generates
near best $n$-term approximations under a similar RIP condition.

AMS Subject Classification: 94A12, 94A15, 68P30, 41A46, 15A52

Key Words: instance optimality, restricted isometry property.

1 Introduction 

Approximation by sparse linear combinations of elements from a fixed redundant family is
a frequently employed technique in signal processing and other application domains. We
consider such problems in a separable Hilbert space $\mathcal{H}$ endowed with a norm $\|\cdot\| := \|\cdot\|_{\mathcal{H}}$
induced by the scalar product $\langle\cdot,\cdot\rangle$ on $\mathcal{H}\times\mathcal{H}$. A countable collection
$\mathcal{D} = \{\varphi_\gamma\}_{\gamma\in\Gamma} \subset \mathcal{H}$ is called a dictionary if it is complete, i.e., the set of finite
linear combinations of elements of the dictionary is dense in $\mathcal{H}$. The simplest example
of a dictionary is the set of elements of a fixed basis of $\mathcal{H}$. But our primary interest is
in redundant families. In such a case, there exists a strict subset of $\mathcal{D}$ that is still a
dictionary. A primary example of a redundant dictionary is a frame, e.g., any union of a
finite number of bases. Without loss of generality we shall always assume that the
dictionary $\mathcal{D}$ is normalized, i.e.,

$$ \|\varphi_\gamma\| = 1, \qquad \gamma \in \Gamma. $$

[Footnote: ... Excellence Initiative of the German Federal and State Governments, and RWTH Aachen Distinguished
Professorship, Graduate School AICES.]

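Given such a dictionary $\mathcal{D}$, we consider the class

$$ \Sigma_n := \Sigma_n(\mathcal{D}) := \Big\{ \sum_{\gamma\in S} c_\gamma \varphi_\gamma \;:\; \#(S) \le n \Big\} \subset \mathcal{H}, \qquad n \ge 1. \tag{1.1} $$

The elements in $\Sigma_n$ are said to be sparse with sparsity $n$. We define

$$ \sigma_n(f)_{\mathcal{H}} := \inf_{g\in\Sigma_n} \|f - g\|, $$

which is called the error of best $n$-term approximation to $f$ from the dictionary $\mathcal{D}$.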
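For a finite dictionary stored as the columns of a matrix, the best $n$-term error can be computed by exhaustive search over supports. The following is a minimal sketch in Python with NumPy, assuming a real, finite-dimensional setting; the function name `best_n_term_error`, the array layout, and the small example dictionary are illustrative choices and not part of the paper.

```python
import itertools
import numpy as np

def best_n_term_error(D, f, n):
    """Brute-force sigma_n(f): minimal distance from f to the span of n columns of D."""
    best = float(np.linalg.norm(f))            # the zero element is always admissible
    for support in itertools.combinations(range(D.shape[1]), n):
        cols = D[:, list(support)]
        # Least squares gives the orthogonal projection of f onto span(cols).
        coef, *_ = np.linalg.lstsq(cols, f, rcond=None)
        best = min(best, float(np.linalg.norm(f - cols @ coef)))
    return best

# Hypothetical example: three unit vectors in R^2 (a redundant dictionary).
D = np.array([[1.0, 0.0, 2 ** -0.5],
              [0.0, 1.0, 2 ** -0.5]])
f = np.array([2.0, 1.0])
sigma_1 = best_n_term_error(D, f, 1)   # best single-element approximation error
sigma_2 = best_n_term_error(D, f, 2)   # any two of these elements span R^2
```

The number of supports grows combinatorially with $n$, so this search is only feasible for very small examples.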

An important distinction between $n$-term dictionary approximation and other forms
of approximation, such as approximation from an $n$-dimensional space, is that the set
$\Sigma_n$ is not a linear space, since the sum of two elements in $\Sigma_n$ is generally not in $\Sigma_n$,
although it is in $\Sigma_{2n}$. Thus $n$-term approximation from a dictionary is an important
example of nonlinear approximation [3] that reaches into numerous application areas
such as adaptive PDE solvers, image encoding, or statistical learning. It also serves as
a performance benchmark in compressed sensing that better captures the robustness of
compressed sensing than results on exact sparse recovery [2].

While there are many themes in $n$-term dictionary approximation, our interest here is
in analyzing the performance of greedy algorithms for generating $n$-term approximations
to a given target element $f \in \mathcal{H}$. There are numerous papers on this subject. We
refer the reader to the survey article [6] as a general reference. Our particular interest
is in understanding what properties of the dictionary $\mathcal{D}$ guarantee that these algorithms
perform similarly to best $n$-term approximation.

These algorithms and best $n$-term approximation have a simple description when the
dictionary $\mathcal{D}$ is an orthonormal or, more generally, a Riesz basis of $\mathcal{H}$. In this case, the
best $n$-term approximations to a given $f \in \mathcal{H}$ are realized by expanding $f$ in terms of the
basis

$$ f = \sum_{\gamma\in\Gamma} c_\gamma \varphi_\gamma \tag{1.2} $$

and retaining $n$ terms from this expansion which correspond to the largest (in absolute
value) expansion coefficients. The typical greedy algorithm will construct the same
approximations. The situation is much less clear when dealing with more general
dictionaries.
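In the orthonormal case, the selection just described amounts to thresholding the coefficient sequence. A minimal sketch, assuming the expansion coefficients are available as a finite real array (the helper name `keep_n_largest` is hypothetical):

```python
import numpy as np

def keep_n_largest(coeffs, n):
    """Keep the n largest coefficients in absolute value and set the others to zero."""
    c = np.asarray(coeffs, dtype=float)
    keep = np.argsort(-np.abs(c), kind="stable")[:n]   # indices of the n largest |c_gamma|
    out = np.zeros_like(c)
    out[keep] = c[keep]
    return out

coeffs = np.array([0.5, -3.0, 2.0, 0.1])
approx = keep_n_largest(coeffs, 2)
# For an orthonormal basis, the approximation error in the Hilbert space norm
# equals the l2 norm of the discarded coefficients.
error = float(np.linalg.norm(coeffs - approx))
```

For an orthonormal basis this error is exactly the best 2-term error; for a redundant dictionary no such closed-form selection is available in general.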

In the case of general dictionaries, algorithms for generating $n$-term approximations
are typically built on some form of greedy selection

$$ \varphi_k := \varphi_{\gamma_k}, \qquad k = 1, 2, \dots, \tag{1.3} $$

of elements from $\mathcal{D}$ and then using a linear combination of $\varphi_1, \dots, \varphi_n$ as the $n$-term
approximation. The standard greedy algorithm (called the Pure Greedy Algorithm) makes
the initial selection $\varphi_1$ as any element such that

$$ \varphi_1 := \operatorname*{Argmax}_{\varphi\in\mathcal{D}} |\langle f, \varphi\rangle|. \tag{1.4} $$

This gives the approximation $f_1 := \langle f, \varphi_1\rangle \varphi_1$ to $f$ and the residual $r_1 := f - f_1$.
Given that $\varphi_1, \dots, \varphi_{k-1}$ have been selected, and an approximation $f_{k-1}$ from
$F_{k-1} := \mathrm{span}\{\varphi_1, \dots, \varphi_{k-1}\}$ has been constructed, the next dictionary element $\varphi_k$ is
chosen as the best match of the residual

$$ r_{k-1} := f - f_{k-1}, \tag{1.5} $$

in the sense that

$$ \gamma_k := \operatorname*{Argmax}_{\gamma\in\Gamma} |\langle r_{k-1}, \varphi_\gamma\rangle|. \tag{1.6} $$

There exist different ways of forming the next approximation $f_k$, resulting in different
greedy algorithms. We focus our attention on Orthogonal Matching Pursuit (OMP), which
forms the new approximation as

$$ f_k := P_k f, \tag{1.7} $$

where $P_k$ is the orthogonal projector onto $F_k$. OMP is also called the Orthogonal Greedy
Algorithm. More generally, we analyze the Weak Orthogonal Matching Pursuit (WOMP),
where the choice of $\varphi_k$ is only required to satisfy

$$ |\langle r_{k-1}, \varphi_k\rangle| \ge \kappa \max_{\varphi\in\mathcal{D}} |\langle r_{k-1}, \varphi\rangle|, \tag{1.8} $$

where $\kappa \in\, ]0, 1]$ is a fixed parameter. This selection rule is more easily implemented in
practical applications. Once the choice of $\varphi_1, \dots, \varphi_k$ is made, $f_k$ is again defined as
the orthogonal projection of $f$ onto $F_k$.
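The selection and projection steps can be sketched for a finite dictionary as follows. This is an illustrative Python/NumPy sketch, not the authors' implementation: it assumes real vectors, a dictionary given as the unit-norm columns of a matrix `D`, and it resolves the freedom in the weak selection rule by taking the first admissible index. The name `womp` and its parameters are hypothetical.

```python
import numpy as np

def womp(D, f, n_steps, kappa=1.0):
    """Weak OMP sketch. D: (d, N) array with unit-norm columns; f: (d,) target.

    kappa in (0, 1] is the weakness parameter; kappa = 1 gives plain OMP.
    Returns the selected indices, the coefficient vector, and the residual.
    """
    selected = []
    coeffs = np.zeros(D.shape[1])
    residual = np.asarray(f, dtype=float).copy()
    for _ in range(n_steps):
        corr = np.abs(D.T @ residual)            # |<r_{k-1}, phi_gamma>| for every gamma
        if corr.max() <= 1e-12:                  # residual numerically orthogonal to D
            break
        # Weak selection: any index reaching kappa * max is admissible;
        # this sketch takes the first one.
        gamma = int(np.flatnonzero(corr >= kappa * corr.max())[0])
        selected.append(gamma)
        # Orthogonal projection of f onto the span of the selected elements.
        sol, *_ = np.linalg.lstsq(D[:, selected], f, rcond=None)
        coeffs[:] = 0.0
        coeffs[selected] = sol
        residual = f - D[:, selected] @ sol
    return selected, coeffs, residual

# Hypothetical example: three unit vectors in R^2.
D = np.array([[1.0, 0.0, 2 ** -0.5],
              [0.0, 1.0, 2 ** -0.5]])
f = np.array([2.0, 1.0])
selected, coeffs, residual = womp(D, f, n_steps=2)
```

In this example the first step picks the diagonal element, since its inner product with `f` is the largest, and the second step completes a basis of the plane so that the residual vanishes. Recomputing the least-squares solution at every step keeps the sketch short; an incremental QR update would be the usual choice for larger problems.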

The main interest of the present paper is to understand what properties of a dictionary
$\mathcal{D}$ guarantee that the approximation error of WOMP after $O(n)$ steps is comparable to
the best $n$-term approximation error $\sigma_n(f)$, at least for a certain range $n \le N$. A related
but less demanding question is to understand when WOMP is guaranteed to exactly
recover $f$ in $O(n)$ steps whenever $f \in \Sigma_n$, for a suitable range of $n$. This is sometimes
referred to as sparse recovery. Of course, as already mentioned, we know that both of these
questions have a positive answer for the entire range of $n$ whenever $\mathcal{D}$ is a Riesz basis for
$\mathcal{H}$.

To give a precise formulation of the type of performance we seek, we define the concept 
of instance optimality. 

Instance Optimality: We say that the WOMP algorithm satisfies instance optimality
for $n \le N$ if there are constants $A, C > 0$, with $A$ an integer, such that the outputs
of WOMP satisfy

$$ \|f - f_{An}\| \le C \sigma_n(f)_{\mathcal{H}} \tag{1.9} $$

for $n \le N$.

Notice that if (1.9) is satisfied, then it implies a positive solution to the sparse recovery
problem for the same range of $n$, since $\sigma_n(f) = 0$ when $f$ is in $\Sigma_n$. Obtaining results on
sparse recovery or instance optimality requires structure on the dictionary $\mathcal{D}$. The first
results of this type were obtained under assumptions on the coherence of a dictionary
$\mathcal{D} \subset \mathcal{H}$, defined by

$$ \mu = \mu(\mathcal{D}) := \sup\{ |\langle \varphi, \psi\rangle| \;:\; \varphi, \psi \in \mathcal{D},\ \varphi \ne \psi \}. $$

The first results on this general circle of problems centered on sparse recovery. Tropp
[7] proved that whenever the dictionary has coherence $\mu < \frac{1}{2n-1}$, then $n$ steps of OMP
recover any $f \in \Sigma_n$ exactly.
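For a finite dictionary, the coherence can be read off the Gram matrix. The sketch below assumes real unit-norm columns and takes the coherence condition of [7] in the form $\mu < 1/(2n-1)$; the function names and the example dictionary are illustrative.

```python
import numpy as np

def coherence(D):
    """mu(D): largest |<phi, psi>| over distinct columns of D (columns assumed unit-norm)."""
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)      # exclude the pairs phi = psi
    return float(G.max())

def tropp_condition(mu, n):
    """True if mu < 1/(2n - 1), written without division."""
    return mu * (2 * n - 1) < 1.0

# Hypothetical example: three unit vectors in R^2.
D = np.array([[1.0, 0.0, 2 ** -0.5],
              [0.0, 1.0, 2 ** -0.5]])
mu = coherence(D)
ok_for_1 = tropp_condition(mu, 1)
ok_for_2 = tropp_condition(mu, 2)
```

Here the coherence is $1/\sqrt{2}$, so the condition holds only for $n = 1$. Coherence-based guarantees typically cover only small sparsity levels, which is one reason to look at weaker assumptions.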

Concerning instance optimality, we mention that Livshitz [5] proved that whenever
$\mu \le \frac{1}{20n}$, then after $2n$ steps, the OMP algorithm returns $f_{2n} \in \Sigma_{2n}$ such that

$$ \|f - f_{2n}\| \le 3 \sigma_n(f)_{\mathcal{H}}. \tag{1.10} $$

A weaker assumption on a dictionary, known as the Restricted Isometry Property
(RIP), was introduced in the context of compressed sensing [1]. To formulate this property,
we introduce the notation

$$ \Phi c := \sum_{\gamma\in\Gamma} c_\gamma \varphi_\gamma \tag{1.11} $$

whenever $c = (c_\gamma)_{\gamma\in\Gamma}$ is a finitely supported sequence. The dictionary $\mathcal{D}$ is said to satisfy
the RIP of order $n \in \mathbb{N}$ with constant $0 < \delta < 1$ provided

$$ (1 - \delta)\|c\|_{\ell^2}^2 \le \|\Phi c\|^2 \le (1 + \delta)\|c\|_{\ell^2}^2, \qquad \|c\|_{\ell^0} := \#(\mathrm{supp}\, c) \le n. \tag{1.12} $$

Hence this property quantifies the deviation of any subset of cardinality at most $n$ from
an orthonormal set. We denote by $\delta_n$ the minimal value of $\delta$ for which this property holds
and remark that trivially $\delta_n \le \delta_{n+1}$.

It is well-known that a coherence bound

$$ \mu(\mathcal{D}) < (n - 1)^{-1} \tag{1.13} $$

implies the validity of RIP($n$) with $\delta_n \le (n - 1)\mu$, but not vice versa [7].
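For very small dictionaries the RIP constant can be computed by brute force from the extreme eigenvalues of all Gram submatrices of size $n$. The sketch below assumes a real dictionary stored as the columns of `D`; the name `rip_constant` is hypothetical, and the cost is exponential in $n$.

```python
import itertools
import numpy as np

def rip_constant(D, n):
    """Smallest delta such that (1 - delta)|c|^2 <= |D c|^2 <= (1 + delta)|c|^2
    for all c supported on at most n indices (n must not exceed the number of columns)."""
    delta = 0.0
    # Supports of size exactly n suffice: smaller supports give principal
    # submatrices, whose eigenvalues lie between the extremes of the larger ones.
    for support in itertools.combinations(range(D.shape[1]), n):
        cols = D[:, list(support)]
        eig = np.linalg.eigvalsh(cols.T @ cols)       # eigenvalues in ascending order
        delta = max(delta, 1.0 - float(eig[0]), float(eig[-1]) - 1.0)
    return delta

# Hypothetical example: three unit vectors in R^2.
D = np.array([[1.0, 0.0, 2 ** -0.5],
              [0.0, 1.0, 2 ** -0.5]])
deltas = [rip_constant(D, n) for n in (1, 2, 3)]
gram = np.abs(D.T @ D)
mu = float((gram - np.diag(np.diag(gram))).max())     # coherence of D
```

In this example $\delta_1 = 0$ and $\delta_2 = \mu = 1/\sqrt{2}$, consistent with $\delta_n \le \delta_{n+1}$ and with $\delta_2 \le (2-1)\mu$. The value returned for $n = 3$ is 1 up to rounding, which signals that the RIP of order 3 fails, as it must for three vectors in the plane.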

In [8], Tong Zhang proved that OMP exactly recovers finite dimensional $n$-sparse
signals whenever the dictionary $\mathcal{D}$ satisfies a Restricted Isometry Property (RIP) of order
$An$ for some constant $A$, and that the procedure is also stable in $\ell^2$ under measurement
noise. The main result of the present paper is the following related theorem on instance
optimality for WOMP.

Theorem 1.1 Given the weakness parameter $\kappa \le 1$, there exist fixed constants $A$, $C$, $\delta^*$
such that the following holds for all $n \ge 0$: if $\mathcal{D}$ is a dictionary in a Hilbert space $\mathcal{H}$ for
which RIP($(A+1)n$) holds with $\delta_{(A+1)n} \le \delta^*$, then, for any target function $f \in \mathcal{H}$, the
WOMP algorithm returns after $An$ steps an approximation $f_{An}$ to $f$ that satisfies

$$ \|f - f_{An}\| \le C \sigma_n(f)_{\mathcal{H}}. \tag{1.14} $$

The values of $A$, $C$, $\kappa$, and $\delta^*$ for which the above result holds are coupled. For
example, it is possible to have a smaller value of $A$ at the price of a larger value of $C$ or
of a smaller value of $\delta^*$. Similarly, a smaller weakness parameter $\kappa$ can be compensated
by increasing $A$.

While the theorem of [8] is not stated in the above form, it can be used to derive
Theorem 1.1 by interpreting the error of best $n$-term approximation as a measurement
noise. In this way, one version of the above result can be derived from [8] for OMP ($\kappa = 1$)
with $\delta^* = \frac{1}{3}$ and $A = 30$. Let us mention that Zhang's theorem is also established in [4],
with the same proof, but with different constants $\delta^* = \frac{1}{6}$ and $A = 12$.

In what follows, we do not focus on improving the constants; rather, our interest
is to provide a conceptually more elementary proof of Theorem 1.1. Namely, the proof
in [8] and [4] is based on an induction argument which involves an auxiliary greedy
algorithm (initialized from a nontrivial sparse approximation) in an inner loop. Our
proof avoids using this auxiliary step. It is also presented in the framework of a possibly
infinite dimensional Hilbert space $\mathcal{H}$. We give the new proof in the following section. We
then give some observations that can be derived from Theorem 1.1.

In this paper, we shall sometimes use the notation $\Phi^* v = (\langle v, \varphi_\gamma\rangle)_{\gamma\in\Gamma}$ for any $v \in \mathcal{H}$,
and $c_T$ to denote, for any $c = (c_\gamma)_{\gamma\in\Gamma}$ and $T \subset \Gamma$, the sequence whose entries coincide
with those of $c$ on $T$ and are 0 otherwise.


2 Proof of Theorem 1.1

In this section, we give a proof of Theorem 1.1. We begin with the following elementary
lemma which guarantees the existence of near best $n$-term approximations from a
dictionary.

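Lemma 2.1 Let $\mathcal{D}$ be a dictionary in a Hilbert space $\mathcal{H}$ that satisfies RIP($2n$). Then,

(i) the set $\Sigma_n$ of all $n$-term linear combinations from $\mathcal{D}$ is closed in $\mathcal{H}$.

(ii) For each $f \in \mathcal{H}$, $\varepsilon > 0$, and $n \ge 1$, there exists a $g \in \Sigma_n$ such that

$$ \|f - g\| \le (1 + \varepsilon)\sigma_n(f)_{\mathcal{H}}. \tag{2.1} $$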

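Proof: To prove (i), we let $(g^k)_{k\ge 0}$ be a sequence of elements from $\Sigma_n$ that converges in
$\mathcal{H}$ towards some $g \in \mathcal{H}$. We may write

$$ g^k = \Phi c^k = \sum_{\gamma\in\Gamma} c^k_\gamma \varphi_\gamma, \tag{2.2} $$

with $\|c^k\|_{\ell^0} \le n$. For any $\varepsilon > 0$, there exists $K$ such that

$$ \|g^k - g^l\| \le \varepsilon, \qquad k, l \ge K. \tag{2.3} $$

From RIP($2n$), it follows that

$$ \|c^k - c^l\|_{\ell^2} \le \frac{\varepsilon}{\sqrt{1 - \delta_{2n}}}, \qquad k, l \ge K, \tag{2.4} $$

which shows that the sequence $(c^k)_{k\ge 0}$ converges in $\ell^2$ to some $c \in \ell^2$. In particular, we
find that

$$ \lim_{k\to+\infty} c^k_\gamma = c_\gamma, \qquad \gamma \in \Gamma. \tag{2.5} $$

If $c_\gamma \ne 0$ for more than $n$ values of $\gamma$, we find that $\|c^k\|_{\ell^0} > n$ for $k$ sufficiently large, which
is a contradiction. It follows that $g = \sum_{\gamma\in\Gamma} c_\gamma \varphi_\gamma \in \Sigma_n$.

To prove (ii), let $g^k \in \Sigma_n$ be such that $\|g^k - f\| \to \sigma_n(f)_{\mathcal{H}}$. If $\sigma_n(f) > 0$, then $g = g^k$
will satisfy (ii) if $k$ is sufficiently large. On the other hand, if $\sigma_n(f) = 0$, then $g^k \to f$ as
$k \to \infty$. By (i), $f \in \Sigma_n$ and so we can take $g = f$. □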

2.1 Reduction of the residual 

Our starting point in proving Theorem 1.1 is the following lemma from [8] which quantifies
the reduction of the residuals generated by the WOMP algorithm under the RIP condition.
In what follows, we denote by

$$ S_k := \{\gamma_1, \dots, \gamma_k\} \tag{2.6} $$

the set of indices selected after $k$ steps of WOMP applied to the given target element
$f \in \mathcal{H}$, and denote as before the residual by $r_k = f - f_k$.

Lemma 2.2 Let $(f_k)_{k\ge 0}$ be the sequence of approximations generated by the WOMP
algorithm applied to $f$, and let $g = \Phi z$ with $z$ supported on a finite set $T$. Then, if $T$ is not
contained in $S_k$, one has

$$ \|r_{k+1}\|^2 \le \|r_k\|^2 - \frac{\kappa^2 (1 - \delta)}{\#(T \setminus S_k)} \max\{0, \|r_k\|^2 - \|f - g\|^2\}, \tag{2.7} $$

where $\delta := \delta_{\#(T\cup S_k)}$ is the corresponding RIP constant and $\kappa \in\, ]0, 1]$ is the weakness
parameter in the WOMP algorithm.

For completeness, we recall the proof at the end of this section. It is at this point that we
depart from the arguments in [8], with the goal of providing a simpler, more transparent
argument. An immediate consequence of Lemma 2.2 is the following.

Proposition 2.3 Assume that for a given $A > 0$ and $\delta^* < 1$, RIP($(A + 1)n$) holds with
$\delta_{(A+1)n} \le \delta^*$. If $g = \Phi z$, where $z$ is supported on a set $T$ such that $\#(T) \le n$, then for
any non-negative integers $(j, m, L)$ such that $\#(T \setminus S_j) \le m$ and $j + mL \le An$, one has

$$ \|r_{j+mL}\|^2 \le e^{-\kappa^2 (1 - \delta^*) L} \|r_j\|^2 + \|f - g\|^2. \tag{2.8} $$

Proof: For every $l$ with $j \le l < j + mL$ we have $\#(T \cup S_l) \le (A+1)n$, hence
$\delta_{\#(T\cup S_l)} \le \delta^*$. Applying Lemma 2.2 successively for $l = j, \dots, j + mL - 1$ gives

$$ \max\{0, \|r_{j+mL}\|^2 - \|f - g\|^2\} \le \Big(1 - \frac{\kappa^2 (1 - \delta^*)}{m}\Big)^{mL} \max\{0, \|r_j\|^2 - \|f - g\|^2\} \le e^{-\kappa^2 (1 - \delta^*) L} \max\{0, \|r_j\|^2 - \|f - g\|^2\}, $$

where we have used the fact that $\#(T \setminus S_l) \le m$ for all $l \ge j$. This gives (2.8) and
completes the proof of Proposition 2.3. □

Proof of Theorem 1.1: We fix $f$ and use the abbreviated notation

$$ \sigma_n := \sigma_n(f)_{\mathcal{H}}, \qquad n \ge 0. \tag{2.9} $$

We first observe that the assertion of the theorem follows from the following.

Claim: If $0 \le k < n$ satisfies

$$ \|r_{Ak}\| \le 2\sigma_k, \tag{2.10} $$

and is such that $\sigma_n < \frac{\sigma_k}{4}$, then there exists $k < k' \le n$ such that

$$ \|r_{Ak'}\| \le 2\sigma_{k'}. \tag{2.11} $$

Indeed, assuming that this claim holds, we complete the proof of the theorem as
follows. We let $k$ be the largest integer in $\{0, \dots, n\}$ for which $\|r_{Ak}\| \le 2\sigma_k$. Since
$\|r_0\| = \sigma_0 = \|f\|$, such a $k$ exists. If $k < n$, then by the claim we must have $\sigma_k \le 4\sigma_n$, and therefore

$$ \|r_{An}\| \le \|r_{Ak}\| \le 2\sigma_k \le 8\sigma_n, \tag{2.12} $$

so that (1.14) holds with $C = 8$. If $k = n$, then (1.14) holds with $C = 2$.

We are therefore left with proving the claim. For this, we fix

$$ \delta^* = \frac{1}{6} \tag{2.13} $$

and $0 \le k < n$ such that (2.10) holds and such that $\sigma_n < \frac{\sigma_k}{4}$. Let $k < K \le n$ be the first
integer such that $\sigma_K < \frac{\sigma_k}{4}$. By (ii) of Lemma 2.1 we know that for any $B > 1$ there is a
$g \in \Sigma_K$ with $\|f - g\| \le B\sigma_K(f)$. Therefore, $g$ has the form

$$ g = \Phi z = \sum_{\gamma\in T} z_\gamma \varphi_\gamma, \qquad \#(T) = K. \tag{2.14} $$

The significance of $K$ is that on the one hand

$$ \|f - g\| \le B\sigma_K < \frac{B}{4}\sigma_k, \tag{2.15} $$

while on the other hand

$$ \sigma_k \le 4\sigma_{K-1}. \tag{2.16} $$

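To eventually apply Proposition 2.3 for the above $g$ and $j = Ak$, we need to bound
$\#(T \setminus S_{Ak})$ with $A$ yet to be specified. To this end, we write $K = k + M$, with $M > 0$,
and observe that if $S \subset T$ is any set with $\#(S) = M$ and $g_S := \Phi z_S$, then

$$ \|g_S\| \ge \|f - (g - g_S)\| - \|f - g\| \ge \sigma_k - B\sigma_K \ge \Big(1 - \frac{B}{4}\Big)\sigma_k, \tag{2.17} $$

where we have used the fact that $g - g_S \in \Sigma_k$. Using RIP, we obtain the following lower
bound for the coefficients of $g$: for any set $S \subset T$ of cardinality $M$,

$$ \Big(1 - \frac{B}{4}\Big)^2 \sigma_k^2 \le \|g_S\|^2 \le (1 + \delta^*) \sum_{\gamma\in S} |z_\gamma|^2 = \frac{7}{6} \sum_{\gamma\in S} |z_\gamma|^2. \tag{2.18} $$

Taking for $S$ the set $S_g$ of the $M$ smallest coefficients of $g$ (in absolute value), we note
that for any more general $S \subset T$ with $\#(S) \ge M$, one has

$$ \frac{1}{\#(S)} \sum_{\gamma\in S} |z_\gamma|^2 \ge \frac{1}{M} \sum_{\gamma\in S_g} |z_\gamma|^2 \ge \frac{6}{7}\Big(1 - \frac{B}{4}\Big)^2 \frac{\sigma_k^2}{M}, $$

and hence

$$ \frac{6}{7}\Big(1 - \frac{B}{4}\Big)^2 \sigma_k^2 \, \frac{\#(S)}{M} \le \sum_{\gamma\in S} |z_\gamma|^2. \tag{2.19} $$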


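For the particular set $S := T \setminus S_{Ak}$, if $\#(S) \ge M$, the above bound combined with the
RIP implies

$$ (1 - \delta^*) \sum_{\gamma\in S} |z_\gamma|^2 \le \|g - f_{Ak}\|^2 \le (\|g - f\| + \|r_{Ak}\|)^2 \le (B\sigma_K + 2\sigma_k)^2 \le \Big(\frac{B}{4} + 2\Big)^2 \sigma_k^2. $$

Since $\delta^* = 1/6$ this gives the bound

$$ \#(T \setminus S_{Ak}) \le \frac{7}{5} \Big( \frac{\frac{B}{4} + 2}{1 - \frac{B}{4}} \Big)^2 M \le 13M, \tag{2.20} $$

where the second inequality is obtained by taking $B$ sufficiently close to 1. If $\#(S) < M$, the bound $\#(T \setminus S_{Ak}) \le 13M$ holds trivially.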
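The constant 13 can be checked numerically. The sketch below assumes that the factor in front of $M$ is $\frac{1+\delta^*}{1-\delta^*}\big(\frac{B/4+2}{1-B/4}\big)^2$, which is how the chain of RIP estimates above is read here; the function name and the sample value of $B$ are illustrative.

```python
def support_bound_factor(B, delta_star=1.0 / 6.0):
    """Factor multiplying M in the bound on the number of indices of T not yet selected.

    Assumed form: (1 + delta*) / (1 - delta*) * ((B/4 + 2) / (1 - B/4))**2.
    """
    return (1 + delta_star) / (1 - delta_star) * ((B / 4 + 2) / (1 - B / 4)) ** 2

limit = support_bound_factor(1.0)    # limiting value as B decreases to 1
near = support_bound_factor(1.01)    # a sample admissible choice B > 1
```

With $\delta^* = 1/6$ the limiting value is $\frac{7}{5}\cdot 9 = 12.6$, so any $B$ slightly larger than 1 keeps the factor below 13.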

We now proceed to verify the claim with $k' = K - 1$ when $K - 1 > k$ and with
$k' = k + 1$ otherwise. In the first case we can use the reduction estimate provided by
Proposition 2.3 with $j = Ak$ in combination with (2.16) to deal with the term $\|r_{Ak}\|$ in
(2.8). When $K = k + 1$, however, we cannot bound $\|r_{Ak}\|$ directly in terms of a $\sigma_l$ for
some $l > k$. Accordingly, we use Proposition 2.3 in different ways for the two cases.

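In the case where $M \ge 2$, i.e., $K - 1 > k$, we apply (2.8) with $j = Ak$, $m = 13M$
and $L = \lceil 4\kappa^{-2} \rceil$. Indeed, $Ak + Lm = Ak + 13M\lceil 4\kappa^{-2}\rceil \le An$ holds for $k + M \le n$ whenever
$A \ge 13\lceil 4\kappa^{-2}\rceil$. Moreover, notice that

$$ A(K - 1) = Ak + A(M - 1) \ge Ak + \frac{A}{2} M \ge Ak + mL \tag{2.21} $$

whenever

$$ A \ge 26 \lceil 4\kappa^{-2} \rceil. \tag{2.22} $$

Since $\kappa^2 (1 - \delta^*) L \ge \frac{10}{3}$, this gives

$$ \|r_{A(K-1)}\|^2 \le \|r_{Ak+mL}\|^2 \le e^{-\kappa^2(1-\delta^*)L} \|r_{Ak}\|^2 + \|f - g\|^2 \le 4e^{-10/3}\sigma_k^2 + B^2\sigma_K^2 \le (64e^{-10/3} + B^2)\sigma_{K-1}^2 \le 4\sigma_{K-1}^2, $$

where we have used (2.16) in the fourth inequality, and the last inequality follows by
taking $B$ sufficiently close to 1. We thus obtain (2.11) for the value $k' = K - 1 > k$.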

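In the case $M = 1$, i.e., $K = k + 1$, we apply (2.8) with $j = Ak$, $m = 13$ and
$L = 2\lceil 4\kappa^{-2} \rceil$. In fact, from (2.20) we know that $\#(T \setminus S_{Ak}) \le 13$, and $An \ge A(k + 1) \ge Ak + mL$ for $A$
satisfying (2.22). Since $\kappa^2 (1 - \delta^*) L \ge \frac{20}{3}$, this yields

$$ \|r_{A(k+1)}\|^2 \le \|r_{Ak+mL}\|^2 \le e^{-\kappa^2(1-\delta^*)L} \|r_{Ak}\|^2 + \|f - g\|^2 \le 4e^{-20/3}\sigma_k^2 + B^2\sigma_K^2. $$

This implies that $S_{A(k+1)}$ contains $T$. Indeed, if it missed one of the indices $\gamma \in T$, then
we infer from the RIP,

$$ (1 - \delta^*)|z_\gamma|^2 \le \|g - f_{A(k+1)}\|^2 \le (\|f - g\| + \|r_{A(k+1)}\|)^2 \le \Big(B\sigma_K + \sqrt{4e^{-20/3}\sigma_k^2 + B^2\sigma_K^2}\Big)^2 \le \Big(\frac{B}{4} + \sqrt{4e^{-20/3} + \frac{B^2}{16}}\Big)^2 \sigma_k^2. $$

On the other hand, we know from (2.19) that

$$ \frac{6}{7}\Big(1 - \frac{B}{4}\Big)^2 \sigma_k^2 \le |z_\gamma|^2, \tag{2.23} $$

which for $B$ sufficiently close to 1 is a contradiction since $\frac{5}{7}\big(1 - \frac{1}{4}\big)^2 > \big(\frac{1}{4} + \sqrt{4e^{-20/3} + \frac{1}{16}}\big)^2$.

This implies that $\|r_{A(k+1)}\| \le \|f - g\| \le B\sigma_{k+1} \le 2\sigma_{k+1}$, and therefore (2.11) holds for the value $k' = k + 1$.
This verifies the claim and hence completes the proof of Theorem 1.1. □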

Let us observe that Theorem 1.1 does not give that $f_n$ is a near-best $n$-term
approximation in the form

$$ \|f - f_n\| \le C_0 \sigma_n(f)_{\mathcal{H}}. \tag{2.24} $$

However, a simple postprocessing of $f_{An}$ by retaining its $n$ largest components does satisfy
(2.24).

Corollary 2.4 Under the assumptions of Theorem 1.1, let $f_{An} = \Phi c^{An}$ be the output of
WOMP after $An$ steps. Let $T \subset \Gamma$, $\#(T) = n$, be a set of indices corresponding to $n$
largest entries of $c^{An}$. Define $f^* \in \Sigma_n$ to be the element obtained by retaining from $f_{An}$
only the $n$ terms corresponding to the indices in $T$. Then,

$$ \|f - f^*\| \le C^* \sigma_n(f)_{\mathcal{H}}, \tag{2.25} $$

where the constant $C^*$ depends on the constant $C$ in Theorem 1.1 and on the RIP constant
$\delta_{(A+1)n}$.

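Proof: By Lemma 2.1 there exists a $c$ with $\|c\|_{\ell^0} \le n$, such that

$$ \|f - \Phi c\| \le 2\sigma_n(f)_{\mathcal{H}}. \tag{2.26} $$

It follows that

$$ \|c - c^{An}\|_{\ell^2} \le \frac{1}{\sqrt{1 - \delta_{(A+1)n}}} \|\Phi c - \Phi c^{An}\| \le \frac{C + 2}{\sqrt{1 - \delta_{(A+1)n}}} \sigma_n(f)_{\mathcal{H}}. \tag{2.27} $$

If $S = \mathrm{supp}(c)$, we obtain

$$ \|c - c^{An}_T\|_{\ell^2} \le \|c_T - c^{An}_T\|_{\ell^2} + \|c_{T^c} - c^{An}_{T^c}\|_{\ell^2} + \|c^{An}_{S\setminus T}\|_{\ell^2} \le 2\|c - c^{An}\|_{\ell^2} + \|c^{An}_{T\setminus S}\|_{\ell^2} \le 3\|c - c^{An}\|_{\ell^2}, \tag{2.28} $$

where we have used that $T$ collects $n$ largest entries of $c^{An}$ and that $c$ vanishes on $T \setminus S$. By (2.27), this provides

$$ \|c - c^{An}_T\|_{\ell^2} \le 3\|c - c^{An}\|_{\ell^2} \le \frac{3(C + 2)}{\sqrt{1 - \delta_{(A+1)n}}} \sigma_n(f)_{\mathcal{H}}. \tag{2.29} $$

The approximation $f^* := \Phi c^{An}_T$ is in $\Sigma_n$ and satisfies

$$ \|f - f^*\| \le 2\sigma_n(f)_{\mathcal{H}} + \|\Phi(c^{An}_T - c)\| \le \Big(2 + \frac{3(C + 2)\sqrt{1 + \delta_{(A+1)n}}}{\sqrt{1 - \delta_{(A+1)n}}}\Big) \sigma_n(f)_{\mathcal{H}}, \tag{2.30} $$

which proves (2.25).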
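To get a feeling for the size of $C^*$, one can evaluate the constant numerically. The sketch below assumes that the final estimate of the proof is read as $C^* = 2 + 3(C+2)\sqrt{1+\delta}/\sqrt{1-\delta}$ with $\delta = \delta_{(A+1)n}$, and plugs in the values $C = 8$ and $\delta^* = 1/6$ used in the proof of Theorem 1.1; the function name is hypothetical.

```python
import math

def c_star(C, delta):
    """Assumed form of the constant: C* = 2 + 3 (C + 2) sqrt(1 + delta) / sqrt(1 - delta)."""
    return 2.0 + 3.0 * (C + 2.0) * math.sqrt(1.0 + delta) / math.sqrt(1.0 - delta)

value = c_star(8.0, 1.0 / 6.0)   # constants taken from the proof of Theorem 1.1
```

With these inputs the value is $2 + 30\sqrt{7/5}$, roughly 37.5; as the text notes, no attempt is made to optimize the constants.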




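Proof of Lemma 2.2: We may assume that $\|r_k\| \ge \|f - g\|$, otherwise there is nothing
to prove. First observe that

$$ \|r_{k+1}\|^2 = \|f - P_{k+1} f\|^2 = \|f - P_k f\|^2 - \|(P_{k+1} - P_k) f\|^2 \le \|r_k\|^2 - |\langle r_k, \varphi_{k+1}\rangle|^2, $$

where the inequality holds because $(P_{k+1} - P_k) f = P_{k+1} r_k$ and $\varphi_{k+1}$ is a unit vector in $F_{k+1}$.
Therefore, it suffices to prove that $\|r_k\|^2 - |\langle r_k, \varphi_{k+1}\rangle|^2$ is bounded by the right hand side
of (2.7), which amounts to showing that

$$ (1 - \delta)(\|r_k\|^2 - \|f - g\|^2) \le \kappa^{-2} \#(T \setminus S_k) |\langle r_k, \varphi_{k+1}\rangle|^2. \tag{2.31} $$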

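To prove this, we first note that

$$ 2\|g - f_k\| \sqrt{\|r_k\|^2 - \|f - g\|^2} \le \|g - f_k\|^2 + \|r_k\|^2 - \|f - g\|^2 = \|g - f_k\|^2 + \|r_k\|^2 - \|g - f_k - r_k\|^2 \le 2|\langle g - f_k, r_k\rangle| = 2|\langle g, r_k\rangle|. $$

This is the same as

$$ \|r_k\|^2 - \|f - g\|^2 \le \frac{|\langle g, r_k\rangle|^2}{\|g - f_k\|^2}. \tag{2.32} $$

If we write $f_k = \Phi c^k$ with $c^k$ supported on $S_k$, then the numerator of the right side
satisfies

$$ |\langle g, r_k\rangle| = |\langle \Phi z, r_k\rangle| \le \|z_{S_k^c}\|_{\ell^1} \|\Phi^* r_k\|_{\ell^\infty} \le \kappa^{-1} \sqrt{\#(T \setminus S_k)}\, \|z_{S_k^c}\|_{\ell^2} |\langle r_k, \varphi_{k+1}\rangle| \le \kappa^{-1} \sqrt{\#(T \setminus S_k)}\, \|z - c^k\|_{\ell^2} |\langle r_k, \varphi_{k+1}\rangle|, $$

where the first inequality uses that $\langle r_k, \varphi_\gamma\rangle = 0$ for $\gamma \in S_k$.
On the other hand, recalling that $\delta = \delta_{\#(S_k\cup T)}$, the denominator satisfies, by the RIP,

$$ \|g - f_k\|^2 = \|\Phi(z - c^k)\|^2 \ge (1 - \delta)\|z - c^k\|_{\ell^2}^2. \tag{2.33} $$

Therefore we have obtained

$$ \|r_k\|^2 - \|f - g\|^2 \le \frac{\#(T \setminus S_k) |\langle r_k, \varphi_{k+1}\rangle|^2}{\kappa^2 (1 - \delta)}, \tag{2.34} $$

which is (2.31). □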


References 

[1] E. Candes, J. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate 
measurements, Comm. Pure and Appl. Math., 59(2006), 1207-1223. 

[2] A. Cohen, W. Dahmen, R. DeVore, Compressed Sensing and best k-term approxima¬ 
tion, J. Amer. Math. Soc. 22 (2009), 211-231. 

[3] R.A. DeVore, Nonlinear approximation, Acta Numerica, 7 (1998), 51-150. 

[4] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing,
Birkhäuser Verlag, Basel, 2013.

[5] E.D. Livshitz, On the optimality of the Orthogonal Greedy Algorithm for $\mu$-coherent
dictionaries, J. Approx. Theory, 164(5) (2012), 668-681.

[6] V.N. Temlyakov, Greedy approximation, Acta Numerica, 17 (2008), 235-409. 

[7] J.A. Tropp, Greed is good: algorithmic results for sparse approximation, IEEE Trans. 
Inf. Theory, 50, 2231-2242, 2004. 

[8] T. Zhang, Sparse recovery with orthogonal matching pursuit under RIP, IEEE Trans. 
Inform. Theory, 57(9) (2011), 6215-6221. 









Albert Cohen

Laboratoire Jacques-Louis Lions, Université Pierre et Marie Curie, 75005, Paris, France
cohen@ann.jussieu.fr

Wolfgang Dahmen

Institut für Geometrie und Praktische Mathematik, RWTH Aachen, Templergraben 55,
D-52056 Aachen, Germany
dahmen@igpm.rwth-aachen.de

Ronald DeVore

Department of Mathematics, Texas A&M University, College Station, TX 77840, USA
rdevore@math.tamu.edu




